On cherche à étudier l’effet de trois facteurs sur le transcriptome des racines d’Arabidopsis thaliana et de la micro Tomate.
## corrplot 0.84 loaded
load(paste0("./GenesCO2_", specie, ".RData"))
load("./normalized.count_At.RData")
# quantification file
data <- read.csv("quantifFiles/QuantifGenes.csv", h = T, sep = ",")
rownames(data) <- data$Gene
genes = which(!(grepl("__", rownames(data))))
not_quant = data[which((grepl("__", rownames(data)))), ]
data = data[genes, grepl("R", colnames(data))]
keep <- rowSums(data) >= 10
data <- data[keep, ]
group <- sapply(colnames(data), getLabel, with.rep = F)
colnames(data) <- sapply(colnames(data), getLabel)
specie = "At"
clusteredGenes <- clustering(sharedBy3, data)****************************************
coseq analysis: Poisson approach & none transformation
K = 2 to 12
Use set.seed() prior to running coseq for reproducible results.
****************************************
Running g = 2 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0"
Running g = 3 ...
[1] "Initialization: 1"
[1] "Log-like diff: 4.90653064844082e-08"
Running g = 4 ...
[1] "Initialization: 1"
[1] "Log-like diff: 9.85664883046411e-11"
Running g = 5 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0"
Running g = 6 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0"
Running g = 7 ...
[1] "Initialization: 1"
[1] "Log-like diff: 2.38742359215394e-12"
Running g = 8 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0"
Running g = 9 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1.70530256582424e-13"
Running g = 10 ...
[1] "Initialization: 1"
[1] "Log-like diff: 6.67910171614494e-13"
Running g = 11 ...
[1] "Initialization: 1"
[1] "Log-like diff: 4.1726698242428e-05"
[1] "Log-like diff: 4.43566960939279e-07"
Running g = 12 ...
[1] "Initialization: 1"
[1] "Log-like diff: 8.95150265023403e-09"
$ICL
$profiles
$boxplots
$probapost_barplots
*************************************************
Model: Poisson
Transformation: none
*************************************************
Clusters fit: 2,3,4,5,6,7,8,9,10,11,12
Clusters with errors: ---
Selected number of clusters via ICL: 12
ICL of selected model: -740101.6
*************************************************
Number of clusters = 12
ICL = -740101.6
*************************************************
Cluster sizes:
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
11 8 20 14 14 4 18
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
12 22 5 1 2
Number of observations with MAP > 0.90 (% of total):
131 (100%)
Number of observations with MAP > 0.90 per cluster (% of total per cluster):
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
11 8 20 14 14 4 18
(100%) (100%) (100%) (100%) (100%) (100%) (100%)
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
12 22 5 1 2
(100%) (100%) (100%) (100%) (100%)
a$cluster <- clusteredGenes[a$ensembl_gene_id]
entrezID <- list()
nb_clust = max(clusteredGenes)
for (clust in seq(1:nb_clust)) {
# print(entrez[entrez$cluster == clust,]$ensembl_transcript_id)
entrezID[[length(entrezID) + 1]] <- na.omit(a[a$cluster == clust, ]$entrezgene_id)
}
names(entrezID) <- as.character(seq(1:nb_clust))
ck <- compareCluster(geneCluster = entrezID, fun = "enrichGO", OrgDb = org.At.tair.db,
ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.01, qvalueCutoff = 0.05)
clusterProfiler::dotplot(ck, x = ~Cluster)# On essaie un autre clustering avec lka librarie MPLN mpln (dataset =
# as.matrix(data[sharedBy3,])) beaucoup trop long, même mutlithreadé, c'est n'impModel-Based Clustering Using MPLN (Parallelized) Description Performs clustering using mixtures of multivariate Poisson-log normal (MPLN) distribution and model selection using AIC, AIC3, BIC and ICL. Since each component/cluster size (G) is independent from another, all Gs in the range to be tested have been parallelized to run on a seperate core using the parallel R package.
Class: pca dudi
Call: dudi.pca(df = log(data + 0.1), center = TRUE, scale = TRUE, scannf = FALSE,
nf = 4)
Total inertia: 24
Eigenvalues:
Ax1 Ax2 Ax3 Ax4 Ax5
17.2861 4.7060 0.6320 0.5028 0.2615
Projected inertia (%):
Ax1 Ax2 Ax3 Ax4 Ax5
72.025 19.608 2.633 2.095 1.089
Cumulative projected inertia (%):
Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
72.03 91.63 94.27 96.36 97.45
(Only 5 dimensions (out of 24) are shown)
NULL
On essaie avec une ACP basée sur modèle Poisson Log Normal, de PNLModels :
groups <- str_split_fixed(colnames(data), "_", 2)[, 1]
co2 <- str_split_fixed(groups, "", 3)[, 1]
nitrate <- factor(str_split_fixed(groups, "", 3)[, 2])
nitrate <- relevel(nitrate, "N")
fer <- factor(str_split_fixed(groups, "", 3)[, 3])
fer = relevel(fer, "F")
covariates <- data.frame(row.names = colnames(data), co2, nitrate, fer)
DEGenes <- sharedBy3
# preparation des données
counts <- round(t(data[DEGenes, ]), 0)
plnData <- prepare_data(counts = counts, covariates = covariates)
PCA_models <- PLNPCA(Abundance ~ 1 + nitrate + fer + co2 + offset(log(Offset)), data = plnData,
ranks = 1:10)
Initialization...
Adjusting 10 PLN models for PCA analysis.
Rank approximation = 1
Rank approximation = 2
Rank approximation = 3
Rank approximation = 4
Rank approximation = 5
Rank approximation = 6
Rank approximation = 7
Rank approximation = 8
Rank approximation = 9
Rank approximation = 10
Post-treatments
DONE!
--------------------------------------------------------
COLLECTION OF 10 POISSON LOGNORMAL MODELS
--------------------------------------------------------
Task: Principal Component Analysis
========================================================
- Ranks considered: from 1 to 10
- Best model (greater BIC): rank = 10 - R2 = 0.97
- Best model (greater ICL): rank = 10 - R2 = 0.97
myPCA_ICL <- getBestModel(PCA_models, "ICL")
plot(myPCA_ICL, ind_cols = groups, var_cols = factor(clusteredGenes[rownames(data[DEGenes,
])]))gridExtra::grid.arrange(plot(myPCA_ICL, ind_cols = groups, map = "individual", plot = FALSE),
plot(myPCA_ICL, var_cols = factor(clusteredGenes[rownames(data[DEGenes, ])]),
map = "variable", plot = FALSE), ncol = 2)log.data <- log2(normalized.count[sharedBy3, ] + 1)
Norm.interest.corr <- corr.test(t(log.data), method = "pearson", ci = F)
Norm.interest.corr$p[lower.tri(Norm.interest.corr$p, diag = -TRUE)] = NA
Pval.adj <- as.data.frame(as.table(Norm.interest.corr$p))
Norm.interest.corr$r[lower.tri(Norm.interest.corr$r, diag = TRUE)] = NA
Correlation <- as.data.frame(as.table(Norm.interest.corr$r))
Cor.table <- na.exclude(cbind(Correlation, Pval.adj))[, c(1, 2, 3, 6)]
colnames(Cor.table) <- c("gene1", "gene2", "cor", "p.adj")
Cor.table.filt <- Cor.table[(abs(Cor.table[, 3]) > 0.9 & Cor.table[, 4] < 0.01),
]
g <- graph.data.frame(Cor.table.filt[, 1:2], directed = -FALSE)
V(g)$color <- clusteredGenes[V(g)]
Node_nw_st <- netStats(g)
Initialization...
Adjusting 30 PLN with sparse inverse covariance estimation
Joint optimization alternating gradient descent and graphical-lasso
sparsifying penalty = 1.677609
sparsifying penalty = 1.549559
sparsifying penalty = 1.431282
sparsifying penalty = 1.322034
sparsifying penalty = 1.221124
sparsifying penalty = 1.127917
sparsifying penalty = 1.041824
sparsifying penalty = 0.9623023
sparsifying penalty = 0.8888506
sparsifying penalty = 0.8210054
sparsifying penalty = 0.7583388
sparsifying penalty = 0.7004554
sparsifying penalty = 0.6469903
sparsifying penalty = 0.597606
sparsifying penalty = 0.5519913
sparsifying penalty = 0.5098583
sparsifying penalty = 0.4709412
sparsifying penalty = 0.4349947
sparsifying penalty = 0.4017919
sparsifying penalty = 0.3711235
sparsifying penalty = 0.3427959
sparsifying penalty = 0.3166306
sparsifying penalty = 0.2924625
sparsifying penalty = 0.2701391
sparsifying penalty = 0.2495196
sparsifying penalty = 0.230474
sparsifying penalty = 0.2128821
sparsifying penalty = 0.196633
sparsifying penalty = 0.1816242
sparsifying penalty = 0.1677609
Post-treatments
DONE!
Stability Selection for PLNnetwork:
subsampling: ++++++++++++++++++++
V(net_norm)$color <- clusteredGenes[V(net_norm)]
plot.igraph(net_norm, vertex.size = 10, vertex.label.cex = 0.5, edge.width = 0.5)[1] 428
[1] 476
****************************************
coseq analysis: Poisson approach & none transformation
K = 2 to 12
Use set.seed() prior to running coseq for reproducible results.
****************************************
Running g = 2 ...
[1] "Initialization: 1"
[1] "Log-like diff: 4.59245086403826e-10"
Running g = 3 ...
[1] "Initialization: 1"
[1] "Log-like diff: 4.1763740910028e-08"
Running g = 4 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.00317565071129877"
[1] "Log-like diff: 6.45337854905392e-05"
[1] "Log-like diff: 1.32643040373637e-06"
Running g = 5 ...
[1] "Initialization: 1"
[1] "Log-like diff: 2.64517119319407e-08"
Running g = 6 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1.68607192208015e-06"
Running g = 7 ...
[1] "Initialization: 1"
[1] "Log-like diff: 5.15270554295455e-05"
[1] "Log-like diff: 1.53693936582044e-05"
[1] "Log-like diff: 4.5935721573187e-06"
Running g = 8 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.00616988683895769"
[1] "Log-like diff: 0.00126856869223246"
[1] "Log-like diff: 0.000260618616952257"
[1] "Log-like diff: 5.33343875019909e-05"
[1] "Log-like diff: 1.15885319260656e-05"
Running g = 9 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.0274883033141791"
[1] "Log-like diff: 0.0140091425551212"
[1] "Log-like diff: 0.00714786389296407"
[1] "Log-like diff: 0.0036493877764201"
[1] "Log-like diff: 0.00186364450210519"
Running g = 10 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.00701190574808308"
[1] "Log-like diff: 0.000307827241659453"
[1] "Log-like diff: 1.41449596142706e-05"
[1] "Log-like diff: 6.48435655392632e-07"
Running g = 11 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.000139610639912746"
[1] "Log-like diff: 4.97193436999055e-06"
Running g = 12 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.000147263159224309"
[1] "Log-like diff: 2.61692528091828e-05"
[1] "Log-like diff: 4.66065253235115e-06"
$ICL
$profiles
$boxplots
$probapost_barplots
*************************************************
Model: Poisson
Transformation: none
*************************************************
Clusters fit: 2,3,4,5,6,7,8,9,10,11,12
Clusters with errors: ---
Selected number of clusters via ICL: 12
ICL of selected model: -3013888
*************************************************
Number of clusters = 12
ICL = -3013888
*************************************************
Cluster sizes:
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
7 121 158 122 35 32 49
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
24 23 103 109 54
Number of observations with MAP > 0.90 (% of total):
835 (99.76%)
Number of observations with MAP > 0.90 per cluster (% of total per cluster):
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
7 121 157 122 35 32 48
(100%) (100%) (99.37%) (100%) (100%) (100%) (97.96%)
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
24 23 103 109 54
(100%) (100%) (100%) (100%) (100%)
a$cluster <- clusteredGenes[a$ensembl_gene_id]
entrezID <- list()
nb_clust = max(clusteredGenes)
for (clust in seq(1:nb_clust)) {
# print(entrez[entrez$cluster == clust,]$ensembl_transcript_id)
entrezID[[length(entrezID) + 1]] <- na.omit(a[a$cluster == clust, ]$entrezgene_id)
}
names(entrezID) <- as.character(seq(1:nb_clust))
ck <- compareCluster(geneCluster = entrezID, fun = "enrichGO", OrgDb = org.At.tair.db,
ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.01, qvalueCutoff = 0.05)
clusterProfiler::dotplot(ck, x = ~Cluster)Class: pca dudi
Call: dudi.pca(df = log(data + 0.1), center = TRUE, scale = TRUE, scannf = FALSE,
nf = 4)
Total inertia: 24
Eigenvalues:
Ax1 Ax2 Ax3 Ax4 Ax5
19.1712 3.3535 0.5249 0.3943 0.1205
Projected inertia (%):
Ax1 Ax2 Ax3 Ax4 Ax5
79.880 13.973 2.187 1.643 0.502
Cumulative projected inertia (%):
Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
79.88 93.85 96.04 97.68 98.18
(Only 5 dimensions (out of 24) are shown)
NULL
groups <- str_split_fixed(colnames(data), "_", 2)[, 1]
co2 <- str_split_fixed(groups, "", 3)[, 1]
nitrate <- factor(str_split_fixed(groups, "", 3)[, 2])
nitrate <- relevel(nitrate, "N")
fer <- factor(str_split_fixed(groups, "", 3)[, 3])
fer = relevel(fer, "F")
covariates <- data.frame(row.names = colnames(data), co2, nitrate, fer)
DEGenes <- sharedBy3
# preparation des données
counts <- round(t(data[DEGenes, ]), 0)
plnData <- prepare_data(counts = counts, covariates = covariates)
PCA_models <- PLNPCA(Abundance ~ 1 + nitrate + fer + co2 + offset(log(Offset)), data = plnData,
ranks = 1:10)
Initialization...
Adjusting 10 PLN models for PCA analysis.
Rank approximation = 1
Rank approximation = 2
Rank approximation = 3
Rank approximation = 4
Rank approximation = 5
Rank approximation = 6
Rank approximation = 7
Rank approximation = 8
Rank approximation = 9
Rank approximation = 10
Post-treatments
DONE!
--------------------------------------------------------
COLLECTION OF 10 POISSON LOGNORMAL MODELS
--------------------------------------------------------
Task: Principal Component Analysis
========================================================
- Ranks considered: from 1 to 10
- Best model (greater BIC): rank = 10 - R2 = 0.96
- Best model (greater ICL): rank = 10 - R2 = 0.96
myPCA_ICL <- getBestModel(PCA_models, "ICL")
plot(myPCA_ICL, ind_cols = groups, var_cols = factor(clusteredGenes[rownames(data[DEGenes,
])]))gridExtra::grid.arrange(plot(myPCA_ICL, ind_cols = groups, map = "individual", plot = FALSE),
plot(myPCA_ICL, var_cols = factor(clusteredGenes[rownames(data[DEGenes, ])]),
map = "variable", plot = FALSE), ncol = 2)log.data <- log2(normalized.count[sharedBy3, ] + 1)
Norm.interest.corr <- corr.test(t(log.data), method = "pearson", ci = F)
Norm.interest.corr$p[lower.tri(Norm.interest.corr$p, diag = -TRUE)] = NA
Pval.adj <- as.data.frame(as.table(Norm.interest.corr$p))
Norm.interest.corr$r[lower.tri(Norm.interest.corr$r, diag = TRUE)] = NA
Correlation <- as.data.frame(as.table(Norm.interest.corr$r))
Cor.table <- na.exclude(cbind(Correlation, Pval.adj))[, c(1, 2, 3, 6)]
colnames(Cor.table) <- c("gene1", "gene2", "cor", "p.adj")
Cor.table.filt <- Cor.table[(abs(Cor.table[, 3]) > 0.9 & Cor.table[, 4] < 0.01),
]
g <- graph.data.frame(Cor.table.filt[, 1:2], directed = -FALSE)
V(g)$color <- clusteredGenes[V(g)]
netStats(g) degree betweenness Rank_stat
AT4G32950 168 4195.902 684.50
AT2G30660 177 3893.654 682.00
AT5G39130 157 4167.511 680.00
AT1G74590 163 3818.345 679.00
AT5G54370 182 3334.164 677.00
AT3G44300 136 7640.805 673.50
AT3G16180 147 3766.436 671.00
AT1G52070 136 4848.893 669.00
AT5G24070 134 3813.655 659.00
AT2G46680 166 2202.663 658.50
AT4G23400 152 2417.920 658.25
AT2G31880 130 4797.134 657.50
AT1G69490 159 2110.001 654.50
AT2G30670 146 2258.923 653.00
AT1G20180 153 1838.916 644.00
AT2G21880 127 3149.782 643.50
AT5G06730 164 1721.279 643.00
AT5G38900 122 3819.292 639.00
AT1G01680 135 2184.533 638.75
AT5G46890 162 1569.691 633.00
AT1G22530 124 2724.852 632.50
AT5G22270 130 2131.515 629.00
AT4G25250 113 4488.276 628.00
AT4G27400 134 1838.540 627.00
AT1G25400 140 1671.199 626.25
[ reached 'max' / getOption("max.print") -- omitted 670 rows ]
Initialization...
Adjusting 30 PLN with sparse inverse covariance estimation
Joint optimization alternating gradient descent and graphical-lasso
sparsifying penalty = 1.108329
sparsifying penalty = 1.023732
sparsifying penalty = 0.945591
sparsifying penalty = 0.8734149
sparsifying penalty = 0.8067478
sparsifying penalty = 0.7451695
sparsifying penalty = 0.6882913
sparsifying penalty = 0.6357546
sparsifying penalty = 0.587228
sparsifying penalty = 0.5424054
sparsifying penalty = 0.5010041
sparsifying penalty = 0.4627629
sparsifying penalty = 0.4274406
sparsifying penalty = 0.3948144
sparsifying penalty = 0.3646786
sparsifying penalty = 0.336843
sparsifying penalty = 0.311132
sparsifying penalty = 0.2873836
sparsifying penalty = 0.2654478
sparsifying penalty = 0.2451864
sparsifying penalty = 0.2264716
sparsifying penalty = 0.2091852
sparsifying penalty = 0.1932183
sparsifying penalty = 0.1784701
sparsifying penalty = 0.1648476
sparsifying penalty = 0.1522649
sparsifying penalty = 0.1406427
sparsifying penalty = 0.1299075
sparsifying penalty = 0.1199918
sparsifying penalty = 0.1108329
Post-treatments
DONE!
Stability Selection for PLNnetwork:
subsampling: ++++++++++++++++++++
V(net)$color <- clusteredGenes[V(net)]
# V(net)$size <- degree[V(net)]*0.1
plot.igraph(net, vertex.size = 10, vertex.label.cex = 0.4, edge.width = 0.5) degree betweenness Rank_stat
AT1G52060 147 12082.4815 400.00
AT1G33830 108 9017.4939 399.00
AT1G21240 106 7948.1457 398.00
AT1G65680 60 2984.7983 395.75
AT4G12520 60 1869.5212 394.75
AT1G54970 73 1592.9211 394.50
AT1G07180 67 1234.9771 393.50
AT1G34047 36 2355.5299 392.50
AT1G15125 39 1698.4125 391.75
AT1G51830 56 573.9773 389.50
AT1G08630 29 1808.7133 388.50
AT5G43175 33 687.6660 387.50
AT4G36570 39 486.7330 387.25
AT1G51840 48 416.1389 387.00
AT1G10070 25 1005.6858 385.25
AT4G24350 29 626.8579 385.00
AT2G23270 25 658.6854 384.25
AT5G43590 34 276.8061 382.75
AT5G20790 30 323.3906 382.50
AT5G60530 34 266.6832 381.75
AT4G36850 22 444.7051 378.75
AT5G38260 23 210.5062 376.50
AT4G21260 29 178.3839 375.50
AT1G63590 20 315.2205 374.50
AT2G14560 22 187.1106 372.75
[ reached 'max' / getOption("max.print") -- omitted 375 rows ]
****************************************
coseq analysis: Poisson approach & none transformation
K = 2 to 12
Use set.seed() prior to running coseq for reproducible results.
****************************************
Running g = 2 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1.07117870129514e-10"
Running g = 3 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.0758628365210647"
[1] "Log-like diff: 0.0198623147529577"
[1] "Log-like diff: 0.00525704058667209"
[1] "Log-like diff: 0.00139919103311215"
[1] "Log-like diff: 0.000341532013560908"
Running g = 4 ...
[1] "Initialization: 1"
[1] "Log-like diff: 911.703693209609"
[1] "Log-like diff: 10229.2505975374"
[1] "Log-like diff: 9950.3031575155"
[1] "Log-like diff: 5034.73628850093"
[1] "Log-like diff: 1475.0169888435"
Running g = 5 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1.06389969189991e-06"
Running g = 6 ...
[1] "Initialization: 1"
[1] "Log-like diff: 61.6607030338482"
[1] "Log-like diff: 220.27999962944"
[1] "Log-like diff: 343.597475598017"
[1] "Log-like diff: 118.657798252968"
[1] "Log-like diff: 89.3377189676277"
Running g = 7 ...
[1] "Initialization: 1"
[1] "Log-like diff: 14.6494159794593"
[1] "Log-like diff: 3.6525242960117"
[1] "Log-like diff: 1.26888565034217"
[1] "Log-like diff: 0.318955235026404"
[1] "Log-like diff: 0.532606363059998"
Running g = 8 ...
[1] "Initialization: 1"
[1] "Log-like diff: 165.180418137196"
[1] "Log-like diff: 507.43334305175"
[1] "Log-like diff: 551.608104494467"
[1] "Log-like diff: 105.042908713928"
[1] "Log-like diff: 666.858741774391"
Running g = 9 ...
[1] "Initialization: 1"
[1] "Log-like diff: 4816.71539334338"
[1] "Log-like diff: 3108.91901188656"
[1] "Log-like diff: 367.411796261148"
[1] "Log-like diff: 1344.87799550861"
[1] "Log-like diff: 945.884005906646"
Running g = 10 ...
[1] "Initialization: 1"
[1] "Log-like diff: 22.7534696536175"
[1] "Log-like diff: 57.7664628398479"
[1] "Log-like diff: 164.083048127435"
[1] "Log-like diff: 664.709681462122"
[1] "Log-like diff: 57.2317241648463"
Running g = 11 ...
[1] "Initialization: 1"
[1] "Log-like diff: 550.402239021098"
[1] "Log-like diff: 609.320300147832"
[1] "Log-like diff: 358.333277077012"
[1] "Log-like diff: 267.162691944079"
[1] "Log-like diff: 136.03943381312"
Running g = 12 ...
[1] "Initialization: 1"
[1] "Log-like diff: 546.958284346865"
[1] "Log-like diff: 300.898799235776"
[1] "Log-like diff: 208.76574849009"
[1] "Log-like diff: 279.930716172939"
[1] "Log-like diff: 1972.49873489241"
$ICL
$profiles
$boxplots
$probapost_barplots
*************************************************
Model: Poisson
Transformation: none
*************************************************
Clusters fit: 2,3,4,5,6,7,8,9,10,11,12
Clusters with errors: ---
Selected number of clusters via ICL: 12
ICL of selected model: -3391000
*************************************************
Number of clusters = 12
ICL = -3391000
*************************************************
Cluster sizes:
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
682 34 571 66 182 131 44
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
405 238 22 232 234
Number of observations with MAP > 0.90 (% of total):
2759 (97.11%)
Number of observations with MAP > 0.90 per cluster (% of total per cluster):
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
670 34 565 61 173 126 39
(98.24%) (100%) (98.95%) (92.42%) (95.05%) (96.18%) (88.64%)
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
393 227 22 222 227
(97.04%) (95.38%) (100%) (95.69%) (97.01%)
# a <- OntologyProfile(sharedBy3) a$cluster <- clusteredGenes[a$ensembl_gene_id]
# entrezID <- list() nb_clust = max(clusteredGenes) for (clust in
# seq(1:nb_clust)) { # print(entrez[entrez$cluster ==
# clust,]$ensembl_transcript_id) entrezID[[length(entrezID) + 1]] <-
# na.omit(a[a$cluster == clust, ]$entrezgene_id) } names(entrezID) <-
# as.character(seq(1:nb_clust)) ck <- compareCluster(geneCluster = entrezID, fun
# = 'enrichGO', OrgDb = org.At.tair.db, ont = 'BP', pAdjustMethod = 'BH',
# pvalueCutoff = 0.01, qvalueCutoff = 0.05) clusterProfiler::dotplot(ck, x =
# ~Cluster)
groups <- str_split_fixed(colnames(data), "_", 2)[, 1]
co2 <- str_split_fixed(groups, "", 3)[, 1]
nitrate <- factor(str_split_fixed(groups, "", 3)[, 2])
nitrate <- relevel(nitrate, "N")
fer <- factor(str_split_fixed(groups, "", 3)[, 3])
fer = relevel(fer, "F")
covariates <- data.frame(row.names = colnames(data), co2, nitrate, fer)
DEGenes <- sharedBy3
# preparation des données
counts <- round(t(data[DEGenes, ]), 0)
plnData <- prepare_data(counts = counts, covariates = covariates)
PCA_models <- PLNPCA(Abundance ~ 1 + nitrate + fer + co2 + offset(log(Offset)), data = plnData,
ranks = 1:10)
Initialization...
Adjusting 10 PLN models for PCA analysis.
Rank approximation = 1
Rank approximation = 2
Rank approximation = 3
Rank approximation = 4
Rank approximation = 5
Rank approximation = 6
Rank approximation = 7
Rank approximation = 8
Rank approximation = 9
Rank approximation = 10
Post-treatments
DONE!
--------------------------------------------------------
COLLECTION OF 10 POISSON LOGNORMAL MODELS
--------------------------------------------------------
Task: Principal Component Analysis
========================================================
- Ranks considered: from 1 to 10
- Best model (greater BIC): rank = 10 - R2 = 0.95
- Best model (greater ICL): rank = 10 - R2 = 0.95
myPCA_ICL <- getBestModel(PCA_models, "ICL")
plot(myPCA_ICL, ind_cols = groups, var_cols = factor(clusteredGenes[rownames(data[DEGenes,
])]))gridExtra::grid.arrange(plot(myPCA_ICL, ind_cols = groups, map = "individual", plot = FALSE),
plot(myPCA_ICL, var_cols = factor(clusteredGenes[rownames(data[DEGenes, ])]),
map = "variable", plot = FALSE), ncol = 2)Class: pca dudi
Call: dudi.pca(df = log(data + 0.1), center = TRUE, scale = TRUE, scannf = FALSE,
nf = 4)
Total inertia: 24
Eigenvalues:
Ax1 Ax2 Ax3 Ax4 Ax5
22.01676 1.12356 0.26987 0.11457 0.06979
Projected inertia (%):
Ax1 Ax2 Ax3 Ax4 Ax5
91.7365 4.6815 1.1245 0.4774 0.2908
Cumulative projected inertia (%):
Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
91.74 96.42 97.54 98.02 98.31
(Only 5 dimensions (out of 24) are shown)
NULL
sample_genes <- sample(sharedBy3, 400)
log.data <- log2(normalized.count[sharedBy3, ] + 1)
Norm.interest.corr <- corr.test(t(log.data), method = "pearson", ci = F)
Norm.interest.corr$p[lower.tri(Norm.interest.corr$p, diag = -TRUE)] = NA
Pval.adj <- as.data.frame(as.table(Norm.interest.corr$p))
Norm.interest.corr$r[lower.tri(Norm.interest.corr$r, diag = TRUE)] = NA
Correlation <- as.data.frame(as.table(Norm.interest.corr$r))
Cor.table <- na.exclude(cbind(Correlation, Pval.adj))[, c(1, 2, 3, 6)]
colnames(Cor.table) <- c("gene1", "gene2", "cor", "p.adj")
Cor.table.filt <- Cor.table[(abs(Cor.table[, 3]) > 0.9 & Cor.table[, 4] < 0.01),
]
g <- graph.data.frame(Cor.table.filt[, 1:2], directed = -FALSE)
V(g)$color <- clusteredGenes[V(g)]
netStats(g) degree betweenness Rank_stat
AT3G24750 408 29895.80 2422.50
AT3G59660 374 30464.74 2421.50
AT4G16920 464 24536.84 2418.00
AT5G55135 333 24067.60 2402.25
AT4G04490 385 16625.79 2392.50
AT1G49160 312 22683.14 2391.00
AT4G15563 304 23216.59 2388.75
AT1G77330 335 17580.56 2384.75
AT5G56320 349 14437.99 2373.75
AT3G62120 253 50151.01 2369.25
AT2G07050 246 64836.94 2368.50
AT1G09935 272 21354.01 2367.00
AT3G17770 332 13362.11 2359.50
AT2G17520 244 32437.83 2359.00
AT1G69030 291 15125.51 2358.25
AT5G51640 322 13584.21 2357.00
AT2G17220 371 12281.28 2356.75
AT5G51070 364 12367.83 2355.50
AT3G17700 370 12199.02 2354.00
AT5G24070 375 11794.52 2349.50
AT5G65810 297 13433.66 2347.75
AT5G19760 255 19783.74 2347.75
AT5G53350 223 40735.04 2346.25
AT2G31880 257 18948.39 2345.75
AT5G12900 280 14435.61 2345.25
[ reached 'max' / getOption("max.print") -- omitted 2411 rows ]
Initialization...
Adjusting 30 PLN with sparse inverse covariance estimation
Joint optimization alternating gradient descent and graphical-lasso
sparsifying penalty = 1.043274
sparsifying penalty = 0.9636421
sparsifying penalty = 0.8900881
sparsifying penalty = 0.8221485
sparsifying penalty = 0.7593946
sparsifying penalty = 0.7014306
sparsifying penalty = 0.647891
sparsifying penalty = 0.5984381
sparsifying penalty = 0.5527598
sparsifying penalty = 0.5105681
sparsifying penalty = 0.4715969
sparsifying penalty = 0.4356003
sparsifying penalty = 0.4023513
sparsifying penalty = 0.3716402
sparsifying penalty = 0.3432732
sparsifying penalty = 0.3170715
sparsifying penalty = 0.2928697
sparsifying penalty = 0.2705152
sparsifying penalty = 0.249867
sparsifying penalty = 0.2307949
sparsifying penalty = 0.2131785
sparsifying penalty = 0.1969067
sparsifying penalty = 0.181877
sparsifying penalty = 0.1679945
sparsifying penalty = 0.1551716
sparsifying penalty = 0.1433275
sparsifying penalty = 0.1323874
sparsifying penalty = 0.1222824
sparsifying penalty = 0.1129487
sparsifying penalty = 0.1043274
Post-treatments
DONE!
Stability Selection for PLNnetwork:
subsampling: ++++++++++++++++++++
IGRAPH f69917a UNW- 400 778 --
+ attr: name (v/c), label (v/c), label.cex (v/n), size (v/n),
| label.color (v/c), frame.color (v/l), weight (e/n), color (e/c),
| width (e/n)
+ edges from f69917a (vertex names):
[1] AT1G49100--AT4G19750 AT1G49100--AT3G22235 AT1G49100--AT4G11490
[4] AT2G34910--AT4G10860 AT2G34910--AT1G76650 AT2G34910--AT5G52930
[7] AT2G34910--AT4G19750 AT2G34910--AT3G22235 AT2G34910--AT1G77120
[10] AT2G34910--AT1G47600 AT2G34910--AT5G06530 AT2G34910--AT2G14560
[13] AT2G34910--AT4G37220 AT2G34910--AT4G29570 AT2G30100--AT4G11490
[16] AT1G73805--AT4G10860 AT1G73805--AT5G06760 AT1G73805--AT3G55790
+ ... omitted several edges
Initialization...
Adjusting 30 PLN with sparse inverse covariance estimation
Joint optimization alternating gradient descent and graphical-lasso
sparsifying penalty = 1.043274
sparsifying penalty = 0.9636421
sparsifying penalty = 0.8900881
sparsifying penalty = 0.8221485
sparsifying penalty = 0.7593946
sparsifying penalty = 0.7014306
sparsifying penalty = 0.647891
sparsifying penalty = 0.5984381
sparsifying penalty = 0.5527598
sparsifying penalty = 0.5105681
sparsifying penalty = 0.4715969
sparsifying penalty = 0.4356003
sparsifying penalty = 0.4023513
sparsifying penalty = 0.3716402
sparsifying penalty = 0.3432732
sparsifying penalty = 0.3170715
sparsifying penalty = 0.2928697
sparsifying penalty = 0.2705152
sparsifying penalty = 0.249867
sparsifying penalty = 0.2307949
sparsifying penalty = 0.2131785
sparsifying penalty = 0.1969067
sparsifying penalty = 0.181877
sparsifying penalty = 0.1679945
sparsifying penalty = 0.1551716
sparsifying penalty = 0.1433275
sparsifying penalty = 0.1323874
sparsifying penalty = 0.1222824
sparsifying penalty = 0.1129487
sparsifying penalty = 0.1043274
Post-treatments
DONE!
Stability Selection for PLNnetwork:
subsampling: ++++++++++++++++++++
V(net)$color <- clusteredGenes[V(net)]
# V(net)$size <- degree[V(net)]*0.1
plot.igraph(net, vertex.size = 10, vertex.label.cex = 0.4, edge.width = 0.5) degree betweenness Rank_stat
AT4G11490 129 8188.89360 400.00
AT4G19750 83 2300.92369 398.50
AT3G22235 78 3166.88673 398.50
AT3G55790 68 1868.38062 397.00
AT5G06760 56 1463.77105 396.00
AT4G37220 39 1134.48775 394.50
AT1G12940 40 887.08129 394.00
AT5G66780 23 984.14734 391.50
AT2G14560 36 283.00607 391.50
AT1G13490 31 250.53060 389.50
AT1G47600 29 251.92000 389.50
AT4G15550 19 262.77394 387.50
AT4G10860 27 147.66193 386.50
AT3G21500 15 388.01341 385.00
AT3G42550 21 123.61836 385.00
AT4G23220 20 121.57303 383.00
AT1G70130 12 367.78839 381.75
AT4G01390 15 191.76261 381.50
AT5G52930 14 164.25780 379.75
AT1G71390 17 91.40846 378.75
AT1G01480 16 103.09378 378.50
AT4G25010 18 79.52183 378.00
AT5G51420 16 81.40382 376.50
AT5G25180 12 116.48613 374.75
AT5G55135 11 122.45205 374.00
[ reached 'max' / getOption("max.print") -- omitted 375 rows ]
Réseau CO2
La tomate semble répondre différemment dans certaines mesures : plus d’effet du CO2, effet moindre du fer, effet nitrate plutôt similaire.
Ontologies moins fournies pour la tomate
Réseau CO2
DEG en commun entre les 4 comparaisons possibles pour l’effet d’un facteur (Venn diagrams)
Fait sur l’ensemble des transcriptômes (plus large que les transcriptômes sur lesquels les DEG ont été détectés)
Relevance Network fait comme Rodrigo. et Al, seuil sur la valeur de corréation et sur la pvalue, gènes triés sur leur centralité et connectivité
Visualisés dans igraph après Clustering
Visualisés dans Cytoscape, pus clustering de communautés (pluggins) et analyse d’enrichissement d’ontologies
Réseau CO2
Regression: Transcription factors are selected by target gene specific sparse linear regression and data resampling approaches.
Bayesian networks optimize posterior probabilities by different heuristic searches.
Correlation Edges are ranked based on variants of correlation.
Méthodes mixtes Consensus entre différentes techniques (conclusion de la métanalyse et benchmarcks du projet DREAM5 (wisdom of crowds))
Others : Genie3: A random forest, neural networks, chi 2 - Mutual Information: Edges are (1) ranked based on variants of mutual information and (2) filtered for causal relationships.